Phase Transitions and Cyclic Phenomena in Bandits with Switching Constraints

Simchi-Levi, David, Xu, Yunzong

arXiv.org Machine Learning

The multi-armed bandit (MAB) problem is one of the most fundamental problems in online learning, with diverse applications ranging from pricing and online advertising to clinical trials. Over the past several decades, it has been a very active research area spanning different disciplines, including computer science, operations research, statistics and economics. In a traditional multi-armed bandit problem, the learner (i.e., decision-maker) is allowed to switch freely between actions, and an effective learning policy may incur frequent switching -- indeed, the learner's task is to balance the exploration-exploitation tradeoff, and both exploration (i.e., acquiring new information) and exploitation (i.e., optimizing decisions based on up-to-date information) require switching. However, in many real-world scenarios, it is costly to switch between different alternatives, and a learning policy with limited switching behavior is preferred. The learner thus has to consider the cost of switching in her learning task.
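To make the switching behavior described above concrete, here is a minimal illustrative sketch (not from the paper) of an epsilon-greedy policy on Bernoulli arms that also counts how often the learner switches arms -- the quantity that a switching constraint would bound. The arm means, epsilon, and horizon are arbitrary example values.

```python
import random

def epsilon_greedy_with_switch_count(means, horizon, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms; return (total reward, switch count).

    Illustrative only: a standard unconstrained policy, shown here to
    exhibit how frequently naive exploration/exploitation switches arms.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    total_reward = 0.0
    switches = 0
    prev_arm = None
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick a random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit: best estimate
        if prev_arm is not None and arm != prev_arm:
            switches += 1  # this is what a switching constraint would limit
        prev_arm = arm
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward, switches
```

Running this on, say, `means=[0.3, 0.7]` over a few thousand rounds shows a nonzero switch count driven by exploration steps; a policy designed for the switching-constrained setting would instead batch its exploration to keep that count small.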